Learning Multiple Models for Reward Maximization
Authors
Abstract
We present an approach to reward maximization in a non-stationary mobile robot environment. The approach works within the realistic constraints of limited local sensing and limited a priori knowledge of the environment. It is based on the use of augmented Markov models (AMMs), a general modeling tool we have developed. AMMs are essentially Markov chains having additional statistics associated with states and state transitions. We have developed an algorithm that constructs AMMs on-line and in real-time with little computational and space overhead, making it practical to learn multiple models of the interaction dynamics between a robot and its environment during the execution of a task. For the purposes of reward maximization in a non-stationary environment, these models monitor events at increasing intervals of time and provide statistics used to discard redundant or outdated information while reducing the probability of conforming to noise. We have successfully implemented this approach with a physical mobile robot performing a mine collection task. In the context of this task, we first present experimental results validating our reward maximization criterion in a stationary environment. We then incorporate our algorithm for redundant/outdated information reduction using multiple models and apply the approach to a non-stationary environment with an abrupt change. Finally, we apply the technique to a simulated version of the task with a gradually shifting environment.
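Read as pseudocode, the abstract suggests a concrete shape for an AMM and for the bank of models kept at increasing time scales. The sketch below is a minimal illustration under assumed details, not the authors' specification: the class names, the doubling-horizon restart schedule, and the drift test are all our assumptions. An AMM here is a Markov chain whose states and transitions carry counts and accumulated reward, updated in O(1) per observation; the monitor keeps a short-window model for recent dynamics alongside longer-lived models, and rebuilds any long-horizon model whose statistics have drifted from the freshest one, discarding outdated information.

# Minimal sketch of augmented Markov models (AMMs) with multi-scale
# monitoring. The window*2**i restart schedule and the scalar drift
# threshold are illustrative assumptions; the paper's actual per-state
# statistics and tests are richer.
from collections import defaultdict

class AMM:
    """Markov chain augmented with visit, transition, and reward statistics."""

    def __init__(self):
        self.visits = defaultdict(int)                      # state -> visit count
        self.trans = defaultdict(lambda: defaultdict(int))  # s -> {s': count}
        self.reward = defaultdict(float)                    # state -> summed reward
        self.prev = None                                    # last state observed

    def update(self, state, reward):
        """Fold one (state, reward) observation into the model: O(1) per event."""
        self.visits[state] += 1
        self.reward[state] += reward
        if self.prev is not None:
            self.trans[self.prev][state] += 1
        self.prev = state

    def transition_prob(self, s, s2):
        total = sum(self.trans[s].values())
        return self.trans[s][s2] / total if total else 0.0

    def mean_reward(self, state):
        n = self.visits[state]
        return self.reward[state] / n if n else 0.0


class MultiScaleMonitor:
    """Several AMMs over horizons that double in length: model i restarts
    every window * 2**i events, so model 0 reflects only recent dynamics
    while higher-index models retain longer history. A long-horizon model
    whose reward estimate drifts from the short-window one is reset,
    discarding outdated statistics."""

    def __init__(self, n_models=4, window=64, drift=0.5):
        self.window = window
        self.drift = drift
        self.models = [AMM() for _ in range(n_models)]
        self.ages = [0] * n_models

    def update(self, state, reward):
        for i in range(len(self.models)):
            self.models[i].update(state, reward)
            self.ages[i] += 1
            if self.ages[i] >= self.window * (2 ** i):  # horizon exceeded: restart
                self.models[i] = AMM()
                self.ages[i] = 0
        short = self.models[0]
        for i in range(1, len(self.models)):
            old = self.models[i]
            if (state in short.visits and state in old.visits and
                    abs(short.mean_reward(state) - old.mean_reward(state)) > self.drift):
                self.models[i] = AMM()                  # outdated data: rebuild
                self.ages[i] = 0

In a control loop, the robot would call monitor.update(current_state, observed_reward) once per event and read reward estimates from whichever surviving model matches its planning horizon.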
Similar Papers
Inverse Reinforce Learning with Nonparametric Behavior Clustering
Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) whose reward function is unspecified, together with a set of demonstrations generated by humans/experts. In practice, however, it may be unreasonable to assume that human behaviors can be explained by one reward function, since they may be inherently inconsistent. Also, demonstration...
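The snippet motivates splitting demonstrations into behavior clusters before recovering a reward per cluster. Below is a toy sketch of that idea, not the paper's algorithm: each demonstration is summarized by its discounted empirical feature expectations and grouped by a DP-means-style nonparametric clustering (Kulis and Jordan, 2012), after which a standard IRL solver would be run once per cluster. The feature map phi, the threshold lam, and all names are assumptions for illustration.

# Toy sketch: nonparametric clustering of demonstrations prior to
# per-cluster IRL. All names and parameters are assumptions.
import numpy as np

def feature_expectations(trajectory, phi, gamma=0.95):
    """Discounted empirical feature expectations of one demonstration.
    `trajectory` is a list of states; `phi` maps a state to a vector."""
    return sum((gamma ** t) * phi(s) for t, s in enumerate(trajectory))

def dp_means(points, lam, n_iter=20):
    """DP-means clustering: a new cluster is opened whenever a point is
    farther than `lam` from every existing centroid, so the number of
    behavior clusters is not fixed in advance."""
    centroids = [points[0].copy()]
    assign = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(points):
            d = [np.linalg.norm(x - c) for c in centroids]
            j = int(np.argmin(d))
            if d[j] > lam:                  # open a new cluster
                centroids.append(x.copy())
                j = len(centroids) - 1
            assign[i] = j
        for j in range(len(centroids)):     # recenter each cluster
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = np.mean(members, axis=0)
    return assign, centroids

# Usage (assuming `demos` and `phi` exist): summarize each demo,
# cluster, then run any standard IRL solver once per cluster instead
# of once over all demonstrations.
# mus = np.stack([feature_expectations(tau, phi) for tau in demos])
# labels, _ = dp_means(mus, lam=1.0)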
The Cyber Rodent Project: Exploration of Adaptive Mechanisms for Self-Preservation and Self-Reproduction
The aim of the Cyber Rodent project is to understand the origins of our reward and affective systems by building artificial agents that share the same intrinsic constraints as natural agents: Self-preservation and self-reproduction. A Cyber Rodent is a robot that can search for and recharge from battery packs on the floor and copy its programs to a nearby agent through its infrared communicatio...
Selection Criteria for Neuromanifolds of Stochastic Dynamics
We present ways of defining neuromanifolds – models of stochastic matrices – that are compatible with the maximization of an objective function such as the expected reward in reinforcement learning theory. Our approach is based on information geometry and aims at reducing the number of model parameters in the hope of improving gradient learning processes.
A Biologically Plausible 3-factor Learning Rule for Expectation Maximization in Reinforcement Learning and Decision Making
One of the most frequent problems in both decision making and reinforcement learning (RL) is expectation maximization involving functionals such as reward or utility. Generally, these problems reduce to computing the optimum of a density function. Instead of trying to find this exact solution, a common approach is to approximate it through a learning process. In this work we propose a...
Inverse Reinforcement Learning with Locally Consistent Reward Functions
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generali...